53 research outputs found

    On the Construction of Prefix-Free and Fix-Free Codes with Specified Codeword Compositions

    Get PDF
    We investigate the construction of prefix-free and fix-free codes with specified codeword compositions. We present a polynomial time algorithm which constructs a fix-free code with the same codeword compositions as a given code for a special class of codes called distinct codes. We consider the construction of optimal fix-free codes which minimizes the average codeword cost for general letter costs with uniform distribution of the codewords and present an approximation algorithm to find a near optimal fix-free code with a given constant cost

    Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity

    Full text link
    Submodular maximization is a general optimization problem with a wide range of applications in machine learning (e.g., active learning, clustering, and feature selection). In large-scale optimization, the parallel running time of an algorithm is governed by its adaptivity, which measures the number of sequential rounds needed if the algorithm can execute polynomially-many independent oracle queries in parallel. While low adaptivity is ideal, it is not sufficient for an algorithm to be efficient in practice---there are many applications of distributed submodular optimization where the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of submodular maximization. In this paper, we give the first constant-factor approximation algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint kk that runs in O(log(n))O(\log(n)) adaptive rounds and makes O(nlog(k))O(n \log(k)) oracle queries in expectation. In our empirical study, we use three real-world applications to compare our algorithm with several benchmarks for non-monotone submodular maximization. The results demonstrate that our algorithm finds competitive solutions using significantly fewer rounds and queries.Comment: 12 pages, 8 figure

    Constrained Binary Identification Problem

    Get PDF
    We consider the problem of building a binary decision tree, to locate an object within a set by way of the least number of membership queries. This problem is equivalent to the "20 questions game" of information theory and is closely related to lossless source compression. If any query is admissible, Huffman coding is optimal with close to H[P] questions on average, the entropy of the prior distribution P over objects. However, in many realistic scenarios, there are constraints on which queries can be asked, and solving the problem optimally is NP-hard. We provide novel polynomial time approximation algorithms where constraints are defined in terms of "graph", general "cost", and "submodular" functions. In particular, we show that under graph constraints, there exists a constant approximation algorithm for locating the target in the set. We then extend our approach for scenarios where the constraints are defined in terms of general cost functions that depend only on the size of the query and provide an approximation algorithm that can find the target within O(log(log n)) gap from the cost of the optimum algorithm. Submodular functions come as a natural generalization of cost functions with decreasing marginals. Under submodular set constraints, we devise an approximation algorithm that can find the target within O(log n) gap from the cost of the optimum algorithm. The proposed algorithms are greedy in a sense that at each step they select a query that most evenly splits the set without violating the underlying constraints. These results can be applied to network tomography, active learning and interactive content search

    Compression with graphical constraints: An interactive browser

    Full text link
    Abstract—We study the problem of searching for a given element in a set of objects using a membership oracle. The membership oracle, given a subset of objects A, and a target object t, determines whether A contains t or not. The goal is to find the target object with the minimum number of questions asked from the oracle. This problem is known to be strongly related to lossless source compression. In fact, the optimum strategy is provided by Hufmman coding with the average number of questions very close to the entropy H(P) of the object set. The membership oracle aims at modelling interactive methods (i.e., incorporate human feedback) has many real life applica-tions. Due to practical constraints imposed by such applications not every subset A of objects can be queried. It is known that in general finding the optimum strategy with such constrains is NP-complete. Given this negative result we restrict attention to the cases represented by graphical models: graph G whose nodes are the database objects is given, and the queries are restricted to be those subsets A that are connected in G. We show that when G itself is connected, there is a search algorithm that finds the target in 4H(P) + 2 queries on the average. Since entropy is the trivial lower bound, our algorithm performs within a constant gap from the optimum strategy. I
    corecore